
Conversation

@valeriy42 (Contributor) commented Sep 26, 2025

This PR fixes the flaky test muted in #93228 by pinning the hyperparameters to values that consistently work. Since the test exercises alias fields rather than the training algorithm, pinning the hyperparameters is safe.
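For context: data frame analytics regression normally tunes its hyperparameters automatically, and that search is the source of the rare nondeterminism. A minimal sketch of what pinning looks like through the REST API, with illustrative values only (the job id, index names, and numbers below are hypothetical, not the ones this PR sets; the actual test configures them through the Java test harness):

```console
PUT _ml/data_frame/analytics/alias-fields-regression
{
  "source": { "index": "source-index" },
  "dest": { "index": "dest-index" },
  "analysis": {
    "regression": {
      "dependent_variable": "field_2",
      "training_percent": 90,
      "eta": 0.5,
      "max_trees": 100,
      "feature_bag_fraction": 1.0,
      "lambda": 1.0,
      "gamma": 1.0
    }
  }
}
```

With every hyperparameter supplied explicitly, the optimizer has nothing left to search over, so repeated runs train the same model.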

Closes #93228

@valeriy42 added the >test (Issues or PRs that are addressing/adding tests), :ml (Machine learning), and Team:ML (Meta label for the ML team) labels Sep 26, 2025
@valeriy42 marked this pull request as ready for review September 26, 2025 13:52
@elasticsearchmachine (Collaborator) commented

Pinging @elastic/ml-core (Team:ML)

@valeriy42 added the auto-backport (Automatically create backport pull requests when merged), v8.19.5, v9.1.5, v8.18.8, and v9.0.8 labels Sep 26, 2025
@DonalEvans (Contributor) commented Sep 26, 2025

Would changing the test so that field_2 is 10 times field_1 instead of 2 times also help with making it more consistent, since it would make the difference between using the correct value and an incorrect one easier to spot?

The way we determine the prediction error also seems like it could be improved, since using the absolute value of the difference between the predicted and actual value is very forgiving for small numbers and very strict for large numbers (being allowed to be off by 3 when the predicted value is 2 is a huge margin of error compared to being allowed to be off by 3 when the predicted value is 600).

Perhaps instead of the absolute difference between predicted and actual, the relative difference could be used, since that would allow the final assertion to be that the predictions are on average within some % of the actual value, rather than being within some absolute range. Something like this perhaps (I chose 1% arbitrarily, I don't know how lenient the test actually needs to be):

```java
int expectedValue = 2 * featureValue;
predictionErrorPercentSum += Math.abs(predictionValue - expectedValue) / expectedValue;
...
double meanPredictionErrorPercent = predictionErrorPercentSum / sourceData.getHits().getHits().length;
// Assert that the predicted values are on average within 1% of the expected values
assertThat(meanPredictionErrorPercent, lessThanOrEqualTo(0.01));
```
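For concreteness, a self-contained version of that sketch; the class, method name, and array inputs are hypothetical stand-ins for the values the real test extracts from the search hits, and computing the expected value as a double guards against integer division:

```java
import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.lessThanOrEqualTo;

final class RelativeErrorAssertions {

    // Hypothetical helper: featureValues and predictionValues stand in for the
    // feature and prediction values pulled out of the search hits in the real test.
    static void assertMeanRelativeError(int[] featureValues, double[] predictionValues) {
        double relativeErrorSum = 0.0;
        for (int i = 0; i < featureValues.length; i++) {
            // field_2 is defined as 2 * field_1 in the test data
            double expectedValue = 2.0 * featureValues[i];
            relativeErrorSum += Math.abs(predictionValues[i] - expectedValue) / expectedValue;
        }
        double meanRelativeError = relativeErrorSum / featureValues.length;
        // On average, predictions should be within 1% of the expected values
        assertThat(meanRelativeError, lessThanOrEqualTo(0.01));
    }
}
```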

@valeriy42 (Contributor, Author) commented

> Would changing the test so that field_2 is 10 times field_1 instead of 2 times also help with making it more consistent, since it would make the difference between using the correct value and an incorrect one easier to spot?
>
> The way we determine the prediction error also seems like it could be improved, since using the absolute value of the difference between the predicted and actual value is very forgiving for small numbers and very strict for large numbers (being allowed to be off by 3 when the predicted value is 2 is a huge margin of error compared to being allowed to be off by 3 when the predicted value is 600).
>
> Perhaps instead of the absolute difference between predicted and actual, the relative difference could be used, since that would allow the final assertion to be that the predictions are on average within some % of the actual value, rather than being within some absolute range. Something like this perhaps (I chose 1% arbitrarily, I don't know how lenient the test actually needs to be):
>
> ```java
> int expectedValue = 2 * featureValue;
> predictionErrorPercentSum += Math.abs(predictionValue - expectedValue) / expectedValue;
> ...
> double meanPredictionErrorPercent = predictionErrorPercentSum / sourceData.getHits().getHits().length;
> // Assert that the predicted values are on average within 1% of the expected values
> assertThat(meanPredictionErrorPercent, lessThanOrEqualTo(0.01));
> ```

@DonalEvans, thank you for your comment. This is a useful technique, and we employ similar assertions elsewhere in the test suite, where we actually verify that the algorithm learns and predicts correct values. The goal of this test is to verify that alias fields are used correctly. Without fixed hyperparameters, the prediction was (on very rare occasions) completely off, so the test would fail regardless of whether the assertion threshold is expressed in relative or absolute terms.

@DonalEvans (Contributor) commented

> @DonalEvans, thank you for your comment. This is a useful technique, and we employ similar assertions elsewhere in the test suite, where we actually verify that the algorithm learns and predicts correct values. The goal of this test is to verify that alias fields are used correctly. Without fixed hyperparameters, the prediction was (on very rare occasions) completely off, so the test would fail regardless of whether the assertion threshold is expressed in relative or absolute terms.

Thanks for the explanation!

@elasticsearchmachine (Collaborator) commented

💚 Backport successful

All target branches succeeded: 8.19, 9.1, 8.18, 9.0

valeriy42 added four commits to valeriy42/elasticsearch that referenced this pull request Oct 1, 2025, each titled:
…e the test (elastic#135541)
elasticsearchmachine pushed a commit that referenced this pull request Oct 1, 2025
…e the test (#135541) (#135769)
elasticsearchmachine pushed a commit that referenced this pull request Oct 1, 2025
…e the test (#135541) (#135770)
elasticsearchmachine pushed a commit that referenced this pull request Oct 1, 2025
…e the test (#135541) (#135768)
elasticsearchmachine pushed a commit that referenced this pull request Oct 1, 2025
…e the test (#135541) (#135771)
Labels

auto-backport (Automatically create backport pull requests when merged), :ml (Machine learning), Team:ML (Meta label for the ML team), >test (Issues or PRs that are addressing/adding tests), v8.18.8, v8.19.5, v9.0.8, v9.1.5, v9.2.0
Development

Successfully merging this pull request may close: [CI] RegressionIT testAliasFields failing